Cost-Sensitive Decision Tree Learning for Forensic Classification
Abstract
In some learning settings, the cost of acquiring features for classification must be paid up front, before the classifier is evaluated. In this paper, we introduce the forensic classification problem and present a new algorithm for building decision trees that maximizes classification accuracy while minimizing total feature costs. By expressing the ID3 decision tree algorithm in an information-theoretic context, we derive our algorithm from a well-formulated problem objective. We evaluate our algorithm across several datasets and show that, for a given level of accuracy, our algorithm builds cheaper trees than existing methods. Finally, we apply our algorithm to a real-world system, CLARIFY. CLARIFY classifies unknown or unexpected program errors by collecting statistics at program runtime, which are then used for decision tree classification after an error has occurred. We demonstrate that if the classifier used by the CLARIFY system is trained with our algorithm, the computational overhead (equivalently, total feature costs) can decrease by many orders of magnitude with only a slight (< 1%) reduction in classification accuracy.
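The abstract does not state the objective the authors derive, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a simple penalized criterion: ID3 information gain minus a hypothetical trade-off weight `lam` times the feature's cost. Because costs are paid up front in the forensic setting, a feature already used anywhere in the tree is treated as free. The names `pick_feature`, `build_tree`, `paid`, and `lam`, and the toy data, are all illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, feature):
    """ID3 information gain from splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def pick_feature(rows, labels, costs, paid, lam):
    """Return the feature with the best penalized gain, or None if none is positive.

    Assumed criterion (not from the paper): gain - lam * cost, where the cost
    of a feature already in `paid` is zero because it was paid up front.
    """
    best, best_score = None, 0.0
    for f in costs:
        penalty = 0.0 if f in paid else lam * costs[f]
        score = info_gain(rows, labels, f) - penalty
        if score > best_score:
            best, best_score = f, score
    return best


def build_tree(rows, labels, costs, paid, lam):
    """Grow a tree greedily; `paid` is shared by all branches and updated in place."""
    f = pick_feature(rows, labels, costs, paid, lam)
    if f is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    paid.add(f)
    children = {}
    for value in {row[f] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[f] == value]
        children[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], costs, paid, lam
        )
    return (f, children)


# Toy data: "pricey" predicts the label perfectly, "cheap" only 6 times out of 8.
data = [(0, 0, 0), (0, 0, 0), (0, 0, 0), (1, 0, 0),
        (1, 1, 1), (1, 1, 1), (1, 1, 1), (0, 1, 1)]
rows = [{"cheap": c, "pricey": p} for c, p, _ in data]
labels = [y for _, _, y in data]
costs = {"cheap": 1.0, "pricey": 100.0}

print(build_tree(rows, labels, costs, set(), lam=0.0))   # cost ignored
print(build_tree(rows, labels, costs, set(), lam=0.05))  # cost penalized
```

With `lam = 0.0` the sketch reduces to plain ID3 and splits on the expensive feature; with a positive `lam` it settles for the cheaper, less accurate feature. This is the kind of accuracy-versus-cost trade-off the abstract describes, although the paper's own criterion may differ.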
Related resources
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking and its danger is ever increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...
Research on Dynamic Cost-sensitive Decision Tree for Mining Uncertain Data Based on the Genetic Algorithm
The existing classifiers for uncertain data do not consider dynamic cost, so this paper proposes a classification approach, the dynamic cost-sensitive decision tree for uncertain data based on the genetic algorithm (GDCDTU), which overcomes the limitations of stationary cost and automatically searches for a suitable cost space for each sub-dataset. Firstly, this paper gives the dyna...
Learning cost-sensitive Bayesian networks via direct and indirect methods
Cost-sensitive learning has become an increasingly important area that recognizes that real world classification problems need to take the costs of misclassification and accuracy into account. Much work has been done on cost-sensitive decision tree learning, but very little has been done on cost-sensitive Bayesian networks. Although there has been significant research on Bayesian networks there...
Comparison of different machine learning methods for diagnosing hypertension in diabetic patients with and without considering costs
Background and Objectives: Diabetic patients are always at risk of hypertension. In this paper, the main goal was to design a native cost sensitive model for the diagnosis of hypertension among diabetics considering the prior probabilities. Methods: In this paper, we tried to design a cost sensitive model for the diagnosis of hypertension in diabetic patients, considering the distribution of...
A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate
Support vector machine (SVM) is a popular classification technique which classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic model, which means that SVM training involves an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...
Publication date: 2006